Abstract: We survey what is known about the information transmitting capacities of
quantum channels, and give a proposal for how to calculate some of these capacities 
using linear programming. 

1 Introduction 

In this paper, we discuss the capacity of quantum channels. Information theory says 
that the capacity of a classical channel is essentially unique, and is representable as 
a single numerical quantity, which gives the amount of information that can be
transmitted asymptotically per channel use [46, 15]. Quantum channels, unlike classical
channels, do not have a single numerical quantity which can be defined as their ca- 
pacity for transmitting information. Rather, quantum channels appear to have at least 
four different natural definitions of capacity, depending on the auxiliary resources al- 
lowed, the class of protocols allowed, and whether the information to be transmitted is 
classical or quantum. 

In this paper, we first introduce the background necessary for understanding the 
capacity of quantum channels, and then define several capacities of these channels. 
For two of these channel capacities, we sketch possible techniques for computing them 
which we believe will be more efficient than techniques currently used. These
capacities are both reducible to optimization problems over matrices. We believe that
a combination of linear programming techniques, including column generation, and 
non-linear optimization will provide a more efficient method for calculating these ca- 
pacities. Unfortunately, at the time of writing this paper, I have not yet tested these 
techniques experimentally. Since I cannot prove that these techniques are efficient, the 
proof of this pudding must be in the computing, and is thus not yet demonstrated. We 
hope to test these techniques in the near future. 

To date, the means used for numerical computations of quantum channel capacities
have been fairly straightforward, often using gradient descent techniques [40]. More
research has been done on the calculation of the entanglement of formation [51, 4],
a related problem [36]. None of these programs have used combinatorial optimization
techniques. For one of the capacities discussed in this paper (the entanglement-assisted
capacity), straightforward optimization of this kind may be fairly efficient, as this capacity has a single
local optimum which is also a global optimum. For two other capacities discussed in
this paper (the $C_{1,1}$ and $C_{1,\infty}$ capacities), I propose techniques involving linear
programming that could be used for the capacity computation, and which I suspect are
much more efficient than straightforward optimization. For another capacity (the
one-way quantum capacity), there are multiple local maxima in the optimization problem,
and we need to determine the global maximum. In this case, unfortunately, although
hill climbing does not seem like it would be efficient, I do not have any alternative
techniques to suggest.

This paper originates in my research investigating the capacities of a quantum
channel [48]. In order to show that a certain channel capacity (which I do not deal with in
this paper; it is less natural than the capacities covered here) lies strictly between two
other channel capacities, I needed to calculate some of these capacities. Specifically, I
needed to calculate what I call the $C_{1,1}$ capacity of a fairly simple quantum channel. I
realized that this was a problem which could be solved numerically using linear
programming, and I used this technique to obtain a picture of the $C_{1,1}$ capacity landscape
which was satisfactory for my application. During this computation, it became clear 
that a better way to solve this problem would be to use column generation techniques to 
make the linear program more efficient, and that these would furthermore also be useful 
for computing other capacities of quantum channels. I have not yet had time to exper- 
imentally test these new techniques (rather, my program started with enough columns 
to ensure obtaining a close approximation of the capacity; this would be an enormous 
waste of resources for larger problems, but for my purposes it was quite adequate). 
This paper will explain the column generation technique. I will try to make it
comprehensible both to researchers with background in mathematical programming and
to researchers with background in quantum information theory. Those wishing more
background information on linear programming, on quantum computing and information,
or on classical information theory can find it in textbooks such as [14, 38, 15].
More specifically, I will give proposals for how to compute two capacities for carrying
classical information over a quantum channel: namely, the $C_{1,1}$ capacity and the $C_{1,\infty}$
capacity. These techniques should also work for computing a formula that I conjecture
gives the classical entanglement-assisted capacity with limited entanglement; this
interpolates between the $C_{1,\infty}$ capacity and the entanglement-assisted capacity. The
description of quantum information theory and capacities contained here is largely
taken from the paper [47].

2 Quantum Information Theory 

The discipline of information theory was founded by Claude Shannon in a truly remarkable
paper [46] which laid down the foundations of the subject. We begin with a quote
from this paper which in a nutshell summarizes one of the main concerns of information
theory:

The fundamental problem of communication is that of reproducing at one 
point either exactly or approximately a message selected at another point. 

This paper proposed the definition of the capacity of a classical channel as the amount 
of information per channel use that can be transmitted asymptotically in the limit of 
many channel uses, with near perfect reproduction at the receiver's end, and gave a 
simple and elegant formula for the capacity. Here, the information is the logarithm 
(base 2) of the number of messages, in other words the number of classical bits that 
can be transmitted by the channel. 

The definition of quantum channel capacity is motivated largely by the same prob- 
lem, with the difference being that either the method of reproduction or the message 
itself involves fundamentally quantum effects. For many years, information theorists 
either ignored quantum effects or approximated them so as to make them susceptible
to classical analysis; it was only in the last decade or so that the systematic study of 
quantum information theory began. 

Shannon's original paper set forth two coding theorems which form the foundation 
of the field of information theory. The first is the source coding theorem, which gives 
a formula for how much a random information source can be compressed. The second 
is the channel coding theorem, which gives a formula for how much redundancy must 
be added to a message in order to accurately reproduce it after sending the information 
through a noisy channel. 

3 Shannon theory 

Shannon's 1948 paper [46] contained two theorems for which we give quantum analogs.
The first of these is the source coding theorem, which gives a formula for how much a 
source emitting random signals can be compressed, while still permitting the original 
signals to be recovered with high probability. Shannon's source coding theorem states 
that $n$ outputs of a source $X$ can be compressed to length $nH(X) + o(n)$ bits, and
restored to the original with high probability, where $H$ is the entropy function. For a
probability distribution with probabilities $p_1, p_2, \ldots, p_n$, the entropy $H$ is

    H(\{p_i\}) = \sum_{i=1}^{n} -p_i \log p_i,    (1)

where information theorists generally take the logarithm base 2 (thus obtaining bits as
the unit of information).

The second of these theorems is the channel coding theorem, which states that with
high probability, $n$ uses of a noisy channel $N$ can communicate $Cn - o(n)$ bits reliably,
where $C$ is the channel capacity given by

    C = \max_{p(X)} I(X; N(X)).    (2)

Here the maximum is taken over all probability distributions on inputs $X$ to the
channel, and $N(X)$ is the output of the channel given input $X$. The mutual information $I$
between two random variables $X$ and $Y$ is defined as

    I(X; Y) = H(Y) - H(Y|X)    (3)
            = H(X) + H(Y) - H(X, Y),    (4)

where $H(X, Y)$ is the entropy of the joint distribution of $X$ and $Y$, and $H(Y|X)$ is the
conditional entropy of $Y$, given $X$. That is, if the possible values of $X$ are $\{x_i\}$, then
the conditional entropy is

    H(Y|X) = \sum_i \Pr(X = x_i)\, H(Y | X = x_i).    (5)

There is an efficient algorithm, the Arimoto-Blahut algorithm, for calculating the
capacity (2) of a classical channel.
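
The alternating iteration behind this algorithm can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not code taken from the cited references; the function name `blahut_arimoto`, the convention that `W[x, y]` holds Pr(output y | input x), and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Estimate the capacity (in bits) of a discrete memoryless channel.

    W[x, y] = Pr(output y | input x); every row must sum to 1.
    Returns (capacity estimate, input distribution achieving it).
    """
    W = np.asarray(W, dtype=float)
    p = np.full(W.shape[0], 1.0 / W.shape[0])    # start from the uniform input
    lower = 0.0
    for _ in range(max_iter):
        q = p @ W                                # output distribution for p
        # D[x] = relative entropy D(W[x, :] || q) in bits, with 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log2(W / q), 0.0)
        D = np.sum(W * log_ratio, axis=1)
        lower = float(p @ D)                     # I(X;Y) for the current p
        upper = float(D.max())                   # upper bound on the capacity
        if upper - lower < tol:
            break
        p = p * np.exp2(D)                       # multiplicative update
        p /= p.sum()
    return lower, p

# Binary symmetric channel with crossover probability 0.1:
# the capacity is 1 - H_2(0.1), approximately 0.531 bits.
bsc_capacity, bsc_input = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
```

The gap between `upper` and `lower` gives a certificate of how close the current input distribution is to optimal, which is why it is used as the stopping rule here.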

When the formula for mutual information is extended to the quantum case, two
generalizations have been found that both give capacities of a quantum channel,
although these capacities differ in both the resources that the sender and receiver have
available and the operations they are permitted to carry out. One of these formulae
generalizes the expression (3) and the other the expression (4); these expressions are
equal in the classical case.

For classical channels, there are a number of extra resources which one might imagine
could increase their capacity. These include a feedback channel from the receiver
to the sender and shared randomness between the sender and the receiver. It turns out
that neither of these resources actually does increase the capacity of a classical
channel. For quantum channels, however, the situation is different. In this case, one of the
resources that we must consider is entanglement. An entangled pair of quantum states
consists of two states which are non-classically correlated. Two parties who share such
a pair of states cannot use them to transmit information, but can use them to obtain
a shared random variable. It turns out that shared entanglement between the sender
and receiver can be used to increase the transmission capacity of a quantum channel.
When the capacity of a quantum channel for transmitting quantum information is
considered, things become even more complicated. In this case, a classical side channel
can increase the capacity of a quantum channel to transmit quantum information, even
though no quantum information can be transmitted by a classical channel.

4 Quantum mechanics 

Before we can start talking about quantum information theory, I need to give a brief
description of some of the fundamental principles of quantum mechanics. The first
of these principles that we present is the superposition principle. In its most basic
form, this principle says that if a quantum system can be in one of two distinguishable
states $|x\rangle$ and $|y\rangle$, it can be in any state of the form $\alpha|x\rangle + \beta|y\rangle$, where $\alpha$ and
$\beta$ are complex numbers with $|\alpha|^2 + |\beta|^2 = 1$. Here $|\cdot\rangle$ is the bra-ket notation that
physicists use for a quantum state; we will occasionally be using it in the rest of this
paper. Recall that we assumed that $|x\rangle$ and $|y\rangle$ were distinguishable, so there must
conceptually be some physical experiment which distinguishes them (this experiment
need not be performable in practice). The principle says further that if we perform this
experiment, we will observe the state $|x\rangle$ with probability $|\alpha|^2$ and $|y\rangle$ with probability
$|\beta|^2$. Furthermore, after this experiment is performed, if state $|x\rangle$ (or $|y\rangle$) is observed
the system will thereafter behave in the same way as it would have had it originally
been in state $|x\rangle$ (or $|y\rangle$).

Mathematically, the superposition principle says that the states of a quantum
system are the unit vectors of a complex vector space, that two orthogonal vectors are
distinguishable, and that measurement projects the state onto one of a complete
orthonormal set of basis vectors. In accordance with physics usage, we will represent
quantum states by column vectors. The Dirac bra-ket notation denotes a column vector
by $|v\rangle$ (a ket) and its Hermitian transpose (i.e., complex conjugate transpose) by $\langle v|$ (a
bra). The inner product between two vectors, $v$ and $w$, is denoted $\langle w|v\rangle = w^\dagger v$; here
we define $X^\dagger$ (whether $X$ is a vector or matrix) to be the Hermitian transpose of $X$.
Multiplying a quantum state vector by a complex phase factor (a unit complex number)
does not change any properties of the system, so mathematically the state of a quantum
system is a point in projective complex space. Unless otherwise stated, however, we
will represent quantum states as unit vectors in some complex vector space $\mathbb{C}^d$.

We will be dealing solely with finite dimensional vector spaces. For an introductory 
paper, quantum information theory is already complicated enough in finite dimensions 
without introducing the additional complexity of infinite-dimensional vector spaces. 
Many of the theorems we discuss do indeed generalize naturally to infinite-dimensional 
spaces. 

A qubit is a two-dimensional quantum system. Probably the most widely known
qubit is the polarization of a photon, and we will thus be using this example. For the 
polarization of a photon, there can only be two distinguishable states. If one sends a 
photon through a birefringent crystal, it will take one of two paths, depending on its 
polarization. By re-orienting this crystal, these two distinguishable polarization states 
can be chosen to be horizontal and vertical, or right diagonal and left diagonal. In 
accordance with the superposition principle, each of these states can be expressed as a 
complex combination of basis states in the other basis. For example. 

Here, $|\circlearrowright\rangle$ and $|\circlearrowleft\rangle$ stand for right and left circularly polarized light, respectively;
these are another pair of basis states for the polarization of photons. For a specific
example, when diagonally polarized photons are put through a birefringent crystal
oriented in the $\updownarrow, \leftrightarrow$ direction, half of them will behave like vertically polarized photons,
and half like horizontally polarized photons; thereafter, these photons will indeed have
these polarizations.

If you have two quantum systems, their joint state space is the tensor product of
their individual state spaces. For example, the state space of two qubits is $\mathbb{C}^4$ and of
three qubits is $\mathbb{C}^8$. The high dimensionality of the space for $n$ qubits, $\mathbb{C}^{2^n}$, is one of
the places where quantum computation attains its power.

The polarization state space of two photons has as a basis the four states 

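    |\leftrightarrow\leftrightarrow\rangle, \quad |\leftrightarrow\updownarrow\rangle, \quad |\updownarrow\leftrightarrow\rangle, \quad |\updownarrow\updownarrow\rangle.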

This state space includes states such as an EPR (Einstein, Podolsky, Rosen) pair of 
photons 

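    \frac{1}{\sqrt{2}} \left( |\updownarrow\leftrightarrow\rangle - |\leftrightarrow\updownarrow\rangle \right) = \frac{1}{\sqrt{2}} \left( |\nearrow\nwarrow\rangle - |\nwarrow\nearrow\rangle \right),    (6)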

where neither qubit alone has a definite state, but which has a definite state when con- 
sidered as a joint system of two qubits. In this state, the two photons have orthogonal 
polarizations in whichever basis they are measured in. Bell [6] showed that the
outcomes of measurements on the photons of this state cannot be reproduced by joint
probability distributions which give probabilities for the outcomes of all possible
measurements, and in which each of the single photons has a definite probability
distribution for the outcome of measurements on it, independent of the measurements which
are made on the other photon [16, 20]. In other words, there cannot be any set of hidden
variables associated with each photon that determines the probability distribution
obtained when this photon is measured in any particular basis. Two quantum systems
such as an EPR pair which are non-classically correlated are said to be entangled [13].

The next fundamental principle of quantum mechanics we discuss is the linearity 
principle. This principle states that an isolated quantum system undergoes linear evo- 
lution. Because the quantum systems we are considering are finite dimensional vector 
spaces, a linear evolution of these can be described by multiplication by a matrix. It is 
fairly easy to check that in order to make the probabilities sum to one, we must restrict 
these matrices to be unitary (a matrix $U$ is unitary if $U^\dagger = U^{-1}$; unitary matrices are
those complex matrices which take unit vectors to unit vectors). 

Although many elementary treatments of quantum mechanics restrict themselves to 
pure states (unit vectors), for quantum information theory we need to treat probability 
distributions over quantum states. These naturally give rise to objects called density 
matrices. For an $n$-dimensional quantum state space, a density matrix is an $n \times n$
Hermitian trace-one positive semidefinite matrix. 

Density matrices arise naturally from quantum states in two ways. The first way in
which density matrices arise is from probability distributions over quantum states. A
rank one density matrix $\rho$ corresponds to the pure state $v$ where $\rho = vv^\dagger$. (Recall that $v^\dagger$
is the Hermitian transpose of $v$.) Suppose that we have a system which is in state $v_i$
with probability $p_i$. The corresponding density matrix is

    \rho = \sum_i p_i v_i v_i^\dagger.    (7)

An important fact about density matrices is that the density matrix for a system
gives as much information as it is possible to obtain about experiments performed
on the system. That is, any two systems with the same density matrix $\rho$ cannot be
distinguished by experiments, provided that no extra side information is given about
these systems.
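
As a small numerical illustration of this fact, the sketch below builds the density matrix of an ensemble of pure states and checks that two different photon ensembles (horizontal/vertical and the two diagonals, each with probability 1/2) give the same matrix. It is written in Python with NumPy; the helper name `density_matrix` and the choice of real coordinate vectors for the polarization states are illustrative assumptions.

```python
import numpy as np

def density_matrix(probs, states):
    """Return sum_i p_i v_i v_i^dagger for pure states given as 1-D arrays."""
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for p, v in zip(probs, states):
        v = np.asarray(v, dtype=complex)
        v = v / np.linalg.norm(v)            # make sure each state is a unit vector
        rho += p * np.outer(v, v.conj())
    return rho

horizontal = np.array([1.0, 0.0])
vertical = np.array([0.0, 1.0])
right_diag = (horizontal + vertical) / np.sqrt(2)
left_diag = (horizontal - vertical) / np.sqrt(2)

rho_hv = density_matrix([0.5, 0.5], [horizontal, vertical])
rho_diag = density_matrix([0.5, 0.5], [right_diag, left_diag])
same_matrix = np.allclose(rho_hv, rho_diag)  # both equal half the identity
```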

The other way in which density matrices arise is through disregarding part of an 
entangled quantum state. Recall that two systems in an entangled pure state have a 
definite quantum state when considered jointly, but that neither of the two systems 
individually can be said to have a definite state. The state of either of these systems 
considered separately is naturally represented by a density matrix. Suppose that we 
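have a state $\rho_{AB}$ on a tensor product system $H_A \otimes H_B$. If we can only see the first part
of the system, this part behaves as though it is in the state $\rho_A = \mathrm{Tr}_B\, \rho_{AB}$. Here, $\mathrm{Tr}_B$
is the partial trace operator. Consider a joint system in the state

    \rho_{AB} = \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{pmatrix}.    (8)

In this example, the dimension of $H_A$ is 3 and the dimension of $H_B$ is the size of the
matrices $B_{ij}$. The partial trace of $\rho_{AB}$, tracing over $H_A$, is

    \mathrm{Tr}_A\, \rho_{AB} = B_{11} + B_{22} + B_{33}.    (9)

Although the above formula also determines the partial trace when we trace over $H_B$,
through a permutation of the coordinates, it is instructive to give this explicitly:

    \mathrm{Tr}_B\, \rho_{AB} = \begin{pmatrix} \mathrm{Tr}\, B_{11} & \mathrm{Tr}\, B_{12} & \mathrm{Tr}\, B_{13} \\ \mathrm{Tr}\, B_{21} & \mathrm{Tr}\, B_{22} & \mathrm{Tr}\, B_{23} \\ \mathrm{Tr}\, B_{31} & \mathrm{Tr}\, B_{32} & \mathrm{Tr}\, B_{33} \end{pmatrix}.    (10)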
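
Both partial traces reduce to a reshape followed by an ordinary trace over a pair of axes. The sketch below, in Python with NumPy, assumes the block ordering of the joint matrix described above (the ordering produced by `np.kron(a, b)`); the function name `partial_trace` and its `trace_out` argument are illustrative, not from the paper.

```python
import numpy as np

def partial_trace(rho_ab, dim_a, dim_b, trace_out):
    """Partial trace of a density matrix on H_A (x) H_B.

    rho_ab is a (dim_a*dim_b) x (dim_a*dim_b) matrix made of dim_b x dim_b
    blocks B_ij, where i and j index basis states of H_A.
    """
    r = np.asarray(rho_ab).reshape(dim_a, dim_b, dim_a, dim_b)
    if trace_out == "A":
        return np.trace(r, axis1=0, axis2=2)   # B_11 + B_22 + ... (acts on H_B)
    if trace_out == "B":
        return np.trace(r, axis1=1, axis2=3)   # matrix of Tr B_ij (acts on H_A)
    raise ValueError("trace_out must be 'A' or 'B'")

# An EPR pair: neither qubit alone has a definite state, so each reduced
# density matrix is half the identity.
singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho_pair = np.outer(singlet, singlet.conj())
rho_first = partial_trace(rho_pair, 2, 2, "B")
rho_second = partial_trace(rho_pair, 2, 2, "A")
```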

The final ingredient we need before we can start explaining quantum information 
theory is a von Neumann measurement. We have seen examples of this process before, 
while explaining the superposition principle; however, we have not yet given the gen- 
eral mathematical formulation of a von Neumann measurement. Suppose that we have 
an $n$-dimensional quantum system $H$. A von Neumann measurement corresponds to a
complete set of orthogonal subspaces $S_1, S_2, \ldots, S_k$ of $H$. Here, complete means that
the subspaces $S_i$ span the space $H$, so that $\sum_i \dim S_i = n$. Let $\Pi_i$ be the projection
matrix onto the subspace $S_i$. If we start with a density matrix $\rho$, the von Neumann
measurement corresponding to the set of subspaces $\{S_i\}$ projects $\rho$ into one of the
subspaces $S_i$. Specifically, it projects $\rho$ onto the $i$'th subspace with probability $\mathrm{Tr}\, \Pi_i \rho$, the
state after the projection being

    \frac{\Pi_i \rho \Pi_i}{\mathrm{Tr}\, \Pi_i \rho},

where we have renormalized the projection to have trace 1. A special case that is
often encountered is when the $S_i$ are all one-dimensional, so that $\Pi_i = w_i w_i^\dagger$, and
the vectors $w_i$ form an orthonormal basis of $H$. Then, a vector $v$ is taken to $w_i$ with
probability $|w_i^\dagger v|^2$ and a density matrix $\rho$ is taken to $w_i w_i^\dagger$ with probability $w_i^\dagger \rho w_i$.
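
The measurement rule can be checked numerically. The following minimal sketch, in Python with NumPy, returns the outcome probabilities and renormalized post-measurement states for a list of projectors; the function name `von_neumann_measurement` and the cutoff used to skip zero-probability outcomes are illustrative assumptions.

```python
import numpy as np

def von_neumann_measurement(rho, projectors):
    """Return a list of (probability, post-measurement state) pairs.

    The projectors are assumed to be orthogonal and to sum to the identity.
    Outcomes of (numerically) zero probability get None as their state.
    """
    results = []
    for proj in projectors:
        prob = float(np.trace(proj @ rho).real)
        post = proj @ rho @ proj / prob if prob > 1e-15 else None
        results.append((prob, post))
    return results

# A diagonally polarized photon measured in the horizontal/vertical basis.
horizontal = np.array([1.0, 0.0])
vertical = np.array([0.0, 1.0])
diagonal = (horizontal + vertical) / np.sqrt(2)
rho = np.outer(diagonal, diagonal)
projectors = [np.outer(horizontal, horizontal), np.outer(vertical, vertical)]
outcomes = von_neumann_measurement(rho, projectors)
```

Each outcome here occurs with probability 1/2, and the state afterwards is the corresponding horizontal or vertical polarization, matching the birefringent crystal example given earlier.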



5 Von Neumann entropy 

We are now ready to consider quantum information theory. We will start by defining 
the entropy of a quantum system. To give some intuition for this definition, we first 
consider some special cases. Consider $n$ photons, each being in the state $|\updownarrow\rangle$ or $|\leftrightarrow\rangle$
with probability 1/2. Any two of these states are completely distinguishable. There
are thus $2^n$ equally probable states of the system, and the entropy is $n$ bits. This is
essentially a classical system.

Consider now $n$ photons, each being in the state $|\updownarrow\rangle$ or $|\nearrow\rangle$ with probability 1/2.
These states are not completely distinguishable, so there are effectively considerably
fewer than $2^n$ states, and the entropy should intuitively be less than $n$ bits.

By thermodynamic arguments involving the increase in entropy associated with the 
work extracted from a system, von Neumann deduced that the (von Neumann) entropy 
of a quantum system with density matrix $\rho$ should be

    H_{vN}(\rho) = -\mathrm{Tr}\, \rho \log \rho.    (11)

Recall that $\rho$ is positive semidefinite, so that $-\mathrm{Tr}\, \rho \log \rho$ is well defined. If $\rho$ is
expressed in coordinates in which it is diagonal with eigenvalues $\lambda_i$, then in these
coordinates $-\rho \log \rho$ is diagonal with eigenvalues $-\lambda_i \log \lambda_i$. We thus see that

    H_{vN}(\rho) = -\sum_i \lambda_i \log \lambda_i,    (12)

so that the von Neumann entropy of a density matrix is the Shannon entropy of the
eigenvalues. (Recall $\mathrm{Tr}\, \rho = 1$, so that $\sum_i \lambda_i = 1$.) This definition is easily seen to
agree with the Shannon entropy in the classical case, where all the states are
distinguishable.
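
Since the entropy depends only on the eigenvalues, it is easy to evaluate numerically. The sketch below, in Python with NumPy, computes it for the two single-photon ensembles used as intuition above: an equal mixture of two orthogonal polarizations, and an equal mixture of two polarizations 45° apart. The function name `von_neumann_entropy`, the eigenvalue cutoff, and the coordinate vectors chosen for the polarizations are illustrative assumptions.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits: the Shannon entropy of the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                 # treat 0 log 0 as 0
    return float(-np.sum(lam * np.log2(lam)))

def equal_mixture(v1, v2):
    """Density matrix of two pure states, each with probability 1/2."""
    return 0.5 * np.outer(v1, np.conj(v1)) + 0.5 * np.outer(v2, np.conj(v2))

vertical = np.array([0.0, 1.0])
horizontal = np.array([1.0, 0.0])
diagonal = np.array([1.0, 1.0]) / np.sqrt(2)

entropy_orthogonal = von_neumann_entropy(equal_mixture(vertical, horizontal))
entropy_overlapping = von_neumann_entropy(equal_mixture(vertical, diagonal))
# entropy_orthogonal is 1 bit per photon; entropy_overlapping is about 0.60 bits.
```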

6 Source coding 

Von Neumann developed the above definition of entropy for thermodynamics. One can 
ask whether this is also the correct definition of entropy for information theory. We 
will first give the example of quantum source coding [30, 44], also called Schumacher
compression, for which we will see that it is indeed the right definition. We consider
a memoryless quantum source that at each time step emits the pure state $v_i$ with
probability $p_i$. We would like to encode this signal in as few qubits as possible, and send
them to a receiver who will then be able to reconstruct the original state. Naturally,
we will not be able to transmit the original state flawlessly. In fact, the receiver
cannot even reconstruct the original state perfectly most of the time (this is the
corresponding requirement in classical information theory). Unlike classical signals,
quantum states are not completely distinguishable theoretically, so reconstructing the
original state most of the time is too stringent a requirement. What we require is that
the receiver be able to reconstruct a state which is almost completely indistinguishable
from the original state nearly all the time. For this we need a measure of
indistinguishability; we will use a measure called fidelity. Suppose that the original signal is a
vector

    u = v_{i_1} \otimes v_{i_2} \otimes \cdots \otimes v_{i_n}.

Then the fidelity between the signal $u$ and the output $\rho$ (which is in general a mixed
state, i.e., a density matrix, on $n$ qubits) is $F = u^\dagger \rho u$. The average fidelity is this
fidelity $F$ averaged over $u$. If the output is also a pure state $v$, the fidelity is $F = u^\dagger v v^\dagger u =
|u^\dagger v|^2$. If the input is a pure state, the fidelity measures the probability of success of a
test which determines whether the output is the same as the input. If both the output
state $\rho_{out}$ and the input state $\rho_{in}$ are mixed states, the fidelity is defined as

    F(\rho_{in}, \rho_{out}) = \left( \mathrm{Tr} \sqrt{ \rho_{out}^{1/2}\, \rho_{in}\, \rho_{out}^{1/2} } \right)^2,

an expression which, despite its appearance, is symmetric in $\rho_{in}$ and $\rho_{out}$ [29]. In the
case where either $\rho_{out}$ or $\rho_{in}$ is pure, this is equivalent to the previous definition, and
for mixed states it is a relatively simple expression which gives an upper bound on the
probability of distinguishing these two states.

Before I can continue to sketch the proof of the quantum source coding theorem, 
I need to review the proof of the classical source coding theorem. Suppose we have a
memoryless source, i.e., a source $X$ that at each time step emits the $i$'th signal type $s_i$
with probability $p_i$, and where the probability distribution for each signal is
independent of the previously emitted signals. The idea behind classical source coding is to
show that with high probability, the source emits a typical sequence. Here a sequence
of length $n$ is defined to be typical if it contains approximately $np_i$ copies of the signal
$s_i$ for every $i$.^1 The number of typical sequences is only $2^{nH(X)+o(n)}$. These can thus
be coded in $nH(X) + o(n)$ bits.

The tool that we use to perform Schumacher compression is that of typical
subspaces. Suppose that we have a density matrix $\rho$ on $H$, where $H = \mathbb{C}^k$, and we take
the tensor product of $n$ copies of $\rho$ in the space $H^{\otimes n}$, i.e., we take $\rho^{\otimes n}$ on $\mathbb{C}^{k^n}$. There
is a typical subspace associated with $\rho^{\otimes n}$. Let $v_1, v_2, \ldots, v_k$ be the eigenvectors of $\rho$
with associated eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Since $\mathrm{Tr}\, \rho = 1$, these $\lambda_i$ form a probability
distribution. Consider typical sequences of the eigenvectors $v_i$, where $\lambda_i$ is the
probability of choosing $v_i$. A typical sequence can be turned into a quantum state in $H^{\otimes n}$ by
taking the tensor products of its elements. That is, if a typical sequence is $v_{i_1}, v_{i_2}, \ldots,
v_{i_n}$, the corresponding quantum state is $w = v_{i_1} \otimes v_{i_2} \otimes \cdots \otimes v_{i_n}$. The typical subspace
$T$ is the subspace spanned by typical sequences of the eigenvectors. The subspace $T$
has dimension equal to the number of typical sequences, or $2^{H_{vN}(\rho) n + o(n)}$.

We can now explain how to do Schumacher compression. Suppose we wish to
compress a source emitting $v_i$ with probability $p_i$. Let the typical subspace
corresponding to $\rho^{\otimes n}$ be $T$, where $\rho = \sum_i p_i v_i v_i^\dagger$ is the density matrix for the source, and
where we are using a block length $n$ for our compression scheme. We take the vector
$u = v_{i_1} \otimes v_{i_2} \otimes \cdots \otimes v_{i_n}$ and make the von Neumann measurement that projects it into
either $T$ or $T^\perp$. If $u$ is projected onto $T$, we send the results of this projection to the
receiver; this can be done with $\log \dim T = nH_{vN}(\rho) + o(n)$ qubits. If $u$ is projected
onto $T^\perp$, our compression algorithm has failed and we can send anything; this does not
degrade the fidelity of our transmission much, because this is a low probability event.

Why did this work? We give a brief sketch of the proof. The main element of the
proof is to show that the probability that we project $u$ onto $T$ approaches 1 as $n$ goes
to $\infty$. This probability is $u^\dagger \Pi_T u$. If this probability were exactly 1, then $u$ would
necessarily be in $T$, and we would have noiseless compression. If this probability
is close to 1, then $u$ is close to the subspace $T$, and so $u$ has high fidelity with the
projected vector $\Pi_T u$. Suppose the probability that the state $u$ is projected onto $T$ is
$1 - \epsilon$. Then $u^\dagger \Pi_T u = 1 - \epsilon$ and the fidelity between the original state $u$ and the final
state is $|\langle u | \Pi_T u \rangle|^2 = (1 - \epsilon)^2$.

Now, recall that if two density matrices are equal, the outcomes of any experiments
performed on them have the same probabilities. Thus, the probability that the source $v_i$
with probabilities $p_i$ is projected onto the typical subspace is the same as for the source
that emits the $i$'th eigenvector of $\rho = \sum_i p_i v_i v_i^\dagger$ with probability equal to the
corresponding eigenvalue $\lambda_i$. Because the eigenvectors are distinguishable, this is essentially the classical case, and
$w$ is in the typical subspace exactly when the sequence of eigenvectors is a typical sequence. We
then know from the classical theory of typical sequences that $w$, the tensor product of the emitted eigenvectors,
is in the typical subspace at least $1 - \epsilon$ of the time, completing the proof.
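
For a qubit source the two facts used in this argument, namely that the typical subspace carries almost all of the probability and that its dimension grows like $2^{nH_{vN}(\rho)}$ up to the slack allowed by the typicality window, can be checked by counting frequency-typical sequences of the two eigenvectors. The sketch below is in plain Python; the window parameter `delta`, the function names, and the example eigenvalue (that of an equal mixture of two polarizations 45° apart) are illustrative assumptions.

```python
from math import comb, cos, log2, pi

def binary_entropy(p):
    """Shannon entropy in bits of a binary distribution (p, 1 - p)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def typical_subspace_stats(lam, n, delta):
    """Dimension of, and probability carried by, the typical subspace.

    The density matrix has eigenvalues lam and 1 - lam. A sequence of n
    eigenvectors is counted as typical if the number k of occurrences of
    the first eigenvector satisfies |k - n*lam| <= delta*n.
    """
    dim_t, weight = 0, 0.0
    for k in range(n + 1):
        if abs(k - n * lam) <= delta * n:
            dim_t += comb(n, k)
            weight += comb(n, k) * lam**k * (1 - lam) ** (n - k)
    return dim_t, weight

lam = 0.5 - 0.5 * cos(pi / 4)        # smaller eigenvalue, about 0.146
stats = {n: typical_subspace_stats(lam, n, delta=0.05) for n in (50, 200, 800)}
rates = {n: log2(dim_t) / n for n, (dim_t, _) in stats.items()}
```

With a fixed window `delta` the rate `log2(dim T) / n` stays below the binary entropy evaluated at `lam + delta`, far less than the 1 qubit per signal needed without compression, while the weight of the typical subspace tends to 1 as `n` grows.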

^1 Strictly speaking, this is the definition of frequency-typical sequences. There is a related but distinct
definition of entropy-typical sequences and subspaces, which can also be used in many of these proofs.

7 Accessible information and the $C_{1,1}$ capacity



The next concept we consider is that of accessible information. Here, we again have a
source emitting state $\sigma_i$ with probability $p_i$. Note that now, the states $\sigma_i$ emitted may
be density matrices rather than pure states. We will ask a different question this time.
We now want to obtain as much information as possible about the sequence of signals
emitted by the source. This is called the accessible information of the source. That is,
the accessible information is the maximum over all measurements of the mutual
information $I(X; Y)$ where $X$ is the random variable telling which signal $\sigma_i$ was emitted
by the source, and $Y$ is the random variable giving the outcome of a measurement on
$\sigma_i$. This gives the capacity of a channel where at each time step the sender must choose
one of the states $\sigma_i$ to send, and must furthermore choose $\sigma_i$ a fraction $p_i$ of the time;
and where the receiver must choose a fixed measurement that he will make on every
signal received.

To find the accessible information, we need to maximize over all measurements. 
For this, we need to be able to characterize all possible quantum measurements. It 
turns out that von Neumann measurements are not the most general class of quan- 
tum measurements; the most general measurement is called a positive operator valued 
measure, or POVM. One way to describe these is as von Neumann measurements on 
a quantum space larger than the original space; that is, by supplementing the quantum 
state space by an ancilla space and taking a von Neumann measurement on the joint 
state space. 

We now give a more effective, but equivalent, characterization of POVM's. For
simplicity, we restrict our discussion to POVM's with a finite number of distinct
outcomes; these turn out to be sufficient for studying capacities of finite dimensional
channels. A POVM can be defined by a set of positive semidefinite matrices $E_i$ satisfying
$\sum_i E_i = I$. If a quantum system has density matrix $\rho$, then the probability of the $i$'th outcome
is

    p_i = \mathrm{Tr}(E_i \rho).    (13)

For a von Neumann measurement, we take $E_i = \Pi_{S_i}$, the projection matrix onto the
$i$'th orthogonal subspace $S_i$. The condition $\sum_i \Pi_{S_i} = I$ is equivalent to the
requirement that the $S_i$ are orthogonal and span the whole state space. To obtain the maximum
information from a POVM, we can assume that the $E_i$'s are rank one (that is, proportional
to pure states); if there is an $E_i$
that is not rank one, then we can always achieve at least as much accessible information
by refining that $E_i$ into a sum $E_i = \sum_j E_{ij}$ where the $E_{ij}$ are rank one.

We now give some examples of the measurements maximizing accessible
information. The first is one of the simplest examples. Suppose that we have just two pure
states in our ensemble, with probability 1/2 each. For example, we could take the states
$|\updownarrow\rangle$ and $|\nearrow\rangle$. Let us take $v_1 = (1, 0)$ and $v_2 = (\cos\theta, \sin\theta)$. We will not prove it
here, but the optimal measurement for these is the von Neumann measurement with
two orthogonal vectors symmetric around $v_1$ and $v_2$. That is, the measurement with
projectors

    w_1 = \left( \cos(\tfrac{\pi}{4} + \tfrac{\theta}{2}), \sin(\tfrac{\pi}{4} + \tfrac{\theta}{2}) \right),    (14)
    w_2 = \left( \cos(-\tfrac{\pi}{4} + \tfrac{\theta}{2}), \sin(-\tfrac{\pi}{4} + \tfrac{\theta}{2}) \right).    (15)

Figure 1: A plot of the von Neumann entropy of the density matrix and the accessible
information for the ensemble of two pure quantum states with equal probabilities and
that differ by an angle of $\theta$, for $0 \le \theta \le \pi/2$. The top curve is the von Neumann
entropy and the bottom the accessible information.

This measurement is symmetric with respect to interchanging $v_1$ and $v_2$, and it leads to
a binary symmetric channel with error probability

    \frac{1}{2} - \frac{\sin\theta}{2}.

The accessible information is thus $1 - H_2(\frac{1}{2} - \frac{\sin\theta}{2})$. Here $H_2$ is the Shannon entropy
of a binary signal, i.e.,

    H_2(p) = -p \log p - (1 - p) \log(1 - p).

For the ensemble containing $v_1$ and $v_2$ with probability 1/2 each, the density matrix is

    \rho = \frac{1}{2} \begin{pmatrix} 1 + \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & 1 - \cos^2\theta \end{pmatrix},

which has eigenvalues $\frac{1}{2} \pm \frac{1}{2}\cos\theta$, so the von Neumann entropy of the density matrix is
$H_2(\frac{1}{2} - \frac{\cos\theta}{2})$. The values of $I_{acc}$ and $H_{vN}$ are plotted in Figure 1. One can see that
the von Neumann entropy is larger than the accessible information.
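
The two curves can be recomputed directly from the measurement statistics rather than from the closed forms. The sketch below, in Python with NumPy, evaluates the mutual information obtained with the symmetric von Neumann measurement (which the text states, without proof, is optimal) and the von Neumann entropy of the ensemble for a given angle; the function names are illustrative assumptions.

```python
import numpy as np

def h2(p):
    """Binary Shannon entropy in bits, with 0 log 0 taken as 0."""
    p = min(max(float(p), 0.0), 1.0)
    return float(-sum(x * np.log2(x) for x in (p, 1.0 - p) if x > 0))

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x, y]."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

def two_state_example(theta):
    """Return (information from the symmetric measurement, entropy of rho)."""
    v = [np.array([1.0, 0.0]), np.array([np.cos(theta), np.sin(theta)])]
    w = [np.array([np.cos(a), np.sin(a)])
         for a in (np.pi / 4 + theta / 2, -np.pi / 4 + theta / 2)]
    # joint[i, j] = Pr(signal v_i sent and outcome w_j observed)
    joint = np.array([[0.5 * (wj @ vi) ** 2 for wj in w] for vi in v])
    rho = 0.5 * sum(np.outer(vi, vi) for vi in v)
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    entropy = float(-np.sum(lam * np.log2(lam)))
    return mutual_information(joint), entropy

info_45, entropy_45 = two_state_example(np.pi / 4)
```

At an angle of 45° this gives roughly 0.40 bits of information against an entropy of roughly 0.60 bits.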

Note that in our first example, the optimum measurement was a von Neumann measurement.
If there are only two states in an ensemble, it has been conjectured that the
measurement optimizing accessible information is always a von Neumann measurement,
in part because extensive computer experiments have not found a counterexample [17].
This conjecture has been proven for quantum states in two dimensions [33].
Our next example shows that the analogous statement does not hold for ensembles composed of
three or more states.

Our second example is three photons with polarizations that differ by 60° each.
These are represented by the vectors

$$ v_0 = (1, 0), \quad v_1 = (-\tfrac12, \tfrac{\sqrt3}{2}), \quad v_2 = (-\tfrac12, -\tfrac{\sqrt3}{2}). $$

The optimal measurement for these states is the POVM corresponding to the three
vectors $w_i$ where $w_i \perp v_i$. We take $E_i = \frac23 w_i w_i^\dagger$, in order for
$\sum_i E_i = I$. If we start with vector $v_i$, it is easy to see that we never obtain $w_i$, but
do obtain the other two possible outcomes with probability $\frac12$ each. This gives an
accessible information of $I_{acc} = \log 3 - 1$. For these three signal states, it is also easy
to check that the density matrix $\rho = \frac12 I$, so $H_{vN} = 1$. Again, we have $I_{acc} < H_{vN}$.
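The claims of this second example can be checked numerically. The sketch below assumes the three signal states are $v_i = (\cos(2\pi i/3), \sin(2\pi i/3))$, used with equal probability $\frac13$, and that logarithms are base 2; `mutual_information` is a helper written for this illustration.

```python
import numpy as np

# Signal states v_i = (cos(2*pi*i/3), sin(2*pi*i/3)) (assumed form of the three states).
angles = 2 * np.pi * np.arange(3) / 3
v = np.stack([np.cos(angles), np.sin(angles)], axis=1)      # shape (3, 2)
# w_i is the unit vector orthogonal to v_i.
w = np.stack([-np.sin(angles), np.cos(angles)], axis=1)

# POVM elements E_i = (2/3) w_i w_i^T and the average density matrix.
E = np.array([(2.0 / 3.0) * np.outer(wi, wi) for wi in w])
rho = sum(np.outer(vi, vi) for vi in v) / 3.0

# Conditional probabilities p(outcome j | state i) = v_i^T E_j v_i.
cond = np.array([[vi @ Ej @ vi for Ej in E] for vi in v])

def mutual_information(prior, cond):
    """Shannon mutual information (bits) between input and measurement outcome."""
    joint = prior[:, None] * cond
    indep = prior[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 1e-15                 # skip zero-probability entries
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

i_acc = mutual_information(np.full(3, 1.0 / 3.0), cond)
print("sum of E_i:\n", E.sum(axis=0))
print("I_acc =", i_acc, " log2(3) - 1 =", np.log2(3) - 1)
```

The printed sum of the $E_i$ should be the identity, and the mutual information should agree with $\log 3 - 1$.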

Given these two examples and some intuition, one might formulate the conjecture
that $I_{acc} \le H_{vN}$. This is true, as in fact is a somewhat stronger theorem which we will
shortly state. The first published proof of this theorem was given by Holevo [23]. It
was earlier conjectured by Gordon [18] and stated by Levitin with no proof [32].

Theorem (Holevo): Suppose that we have a memoryless source emitting an ensemble
of (possibly mixed) states $\sigma_i$, where $\sigma_i$ is emitted with probability $p_i$. Let

$$ \chi = H_{vN}\Big(\sum_i p_i \sigma_i\Big) - \sum_i p_i H_{vN}(\sigma_i). \tag{18} $$

Then

$$ I_{acc} \le \chi. \tag{19} $$

The conditions for equality in this result are known. If all the $\sigma_i$ commute, then
they are simultaneously diagonalizable, and the situation is essentially classical. In this
case, $I_{acc} = \chi$; otherwise $I_{acc} < \chi$.

We define the $C_{1,1}$ capacity of a quantum channel as the maximum over all ensembles
of input states of the accessible information contained by the corresponding
ensemble of output states. This is the capacity of the channel for transmitting classical
information if we restrict the protocols that we use; namely, we only allow protocols
that do not send any states that are entangled over more than one channel use (this is
the significance of the first '1' in the subscript), and do not perform any joint quantum
measurements involving more than one channel output (this is the significance of the
second '1'), and further we do not allow adaptive measurements of the outputs (i.e., the
measurement chosen cannot depend on results of previous measurements on channel
outputs). Allowing adaptive measurements can in certain circumstances increase the
capacity, but they do not generally allow one to reach the $C_{1,\infty}$ capacity discussed in the
next section [48].

For example, if we consider the quantum channel where the sender can choose to
convey to the receiver either of the two pure quantum states of our first example, the
optimum ensemble is the ensemble consisting of both states with equal probability,
and the $C_{1,1}$ capacity is $1 - H_2(\frac12 - \frac{\sin\theta}{2})$. For the channel which can convey
to the receiver any of the three states in our second example, the optimal ensemble giving
$C_{1,1}$ turns out to be that which uses just two of the three states, each with probability
$\frac12$. This is our first example with $\theta = 60^\circ$, so this channel has a $C_{1,1}$ capacity of

$$ 1 - H_2(\tfrac12 - \tfrac{\sqrt3}{4}) = .6454. $$

8 The classical capacity of a quantum channel 

One can ask the question: is the $C_{1,1}$ capacity the most information that can be sent
per quantum state, using only the three states of our second example? The answer is,
surprisingly, "no". Suppose that we use the three length-two codewords $v_0 \otimes v_0$,
$v_1 \otimes v_1$, and $v_2 \otimes v_2$. These are three pure states in the four-dimensional quantum
space of two qubits. Since there are only three vectors, they lie in a three-dimensional subspace.
The inner product between any two of these states is $\frac14$. One can show that for this tensor
product ensemble, the optimal accessible information is attained by using the von Neumann
measurement having three basis vectors obtained by "pulling" the three vectors $v_i \otimes v_i$
apart until they are all orthogonal. This measurement gives $I_{acc} = 1.369$ bits, larger
than twice the $C_{1,1}$ capacity, which is 1.2908 bits. We thus find that block coding and
joint measurements let us achieve a better information transmission rate than $C_{1,1}$.
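The figure of 1.369 bits can be reproduced with a few lines of linear algebra. The sketch below assumes the signal states $v_i = (\cos(2\pi i/3), \sin(2\pi i/3))$, equal prior probabilities, and base-2 logarithms. The "pulled apart" orthogonal vectors are computed as $\phi^{-1/2} u_j$ with $\phi = \sum_j u_j u_j^\dagger$, the construction known as the square root measurement.

```python
import numpy as np

angles = 2 * np.pi * np.arange(3) / 3
v = np.stack([np.cos(angles), np.sin(angles)], axis=1)
u = np.array([np.kron(vi, vi) for vi in v])        # codewords v_i (x) v_i, shape (3, 4)

gram = u @ u.T                                     # pairwise inner products
phi = sum(np.outer(ui, ui) for ui in u)            # phi = sum_j u_j u_j^T

# Inverse square root of phi on its support (phi has rank 3 in a 4-dim space).
vals, vecs = np.linalg.eigh(phi)
keep = vals > 1e-12
inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T

m = inv_sqrt @ u.T                                 # columns are the measurement vectors
cond = (u @ m) ** 2                                # p(j | i) = (u_i^T phi^{-1/2} u_j)^2

def mutual_information(prior, cond):
    """Shannon mutual information (bits) between codeword and outcome."""
    joint = prior[:, None] * cond
    indep = prior[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 1e-15
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

i_acc_pair = mutual_information(np.full(3, 1.0 / 3.0), cond)
print("pairwise inner products:\n", gram)
print("I_acc for the two-copy ensemble =", round(i_acc_pair, 4))
```

The measurement vectors come out orthonormal, so this is a von Neumann measurement on the three-dimensional subspace, and the resulting mutual information exceeds twice the single-copy value of .6454.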

Having found that length two codewords work better than length one codewords,
the natural question becomes: as the lengths of our codewords go to infinity, how
well can we do? We define the $C_{1,\infty}$ capacity of a quantum channel as the capacity
over protocols which do not permit inputs entangled between two or more channel
uses, but do allow joint quantum measurements over arbitrarily many channel uses. A
generalization of Shannon's theorem giving the $C_{1,\infty}$ capacity has been proven.

Theorem (Holevo [24], Schumacher-Westmoreland): The $C_{1,\infty}$ capacity of a
quantum channel, i.e., that capacity obtainable using codewords composed of signal
states $\sigma_i$, where the probability of using $\sigma_i$ is $p_i$, is

$$ \chi = H_{vN}\Big(\sum_i p_i \sigma_i\Big) - \sum_i p_i H_{vN}(\sigma_i). \tag{20} $$

Note that $\chi$ is a function of the probabilistic ensemble of the signal states $\{\sigma_i, p_i\}_i$
that we have chosen, where state $\sigma_i$ has probability $p_i$. We will sometimes write
$\chi(\{\sigma_i, p_i\}_i)$ so as to explicitly show this dependence. Another approach to proving
this theorem, which also provides some additional results, appears in [37, 39].

We later give a sketch of the proof of the $C_{1,\infty}$ capacity formula in the special
case where the $\sigma_i$ are pure states. We will first ask: Does this formula give the true
capacity of a quantum channel $\mathcal{N}$? There are certainly protocols in which the sender,
for example, uses the two halves of an EPR pair of entangled qubits (as in Eq. (6))
as inputs for two separate channel uses. The question is: does allowing this type of
protocol let one obtain a larger capacity?

Before we address this question (we will not be able to answer it) we should give the
mathematical description of a general quantum channel. If $\mathcal{N}$ is a memoryless quantum
communication channel, then it must take density matrices to density matrices. This
means $\mathcal{N}$ must be a linear trace preserving positive map. Here, linear is required by
the basic principles of quantum mechanics and trace preserving is required since the
channel must preserve trace 1 matrices. Positive means the channel takes positive
semidefinite matrices to positive semidefinite matrices (and thus takes density matrices
to density matrices). For $\mathcal{N}$ to be a valid quantum map, it must have one more property:
namely, it must be completely positive. This means that $\mathcal{N}$ is positive even when it is
tensored with the identity map. There is a theorem [22] that any linear completely
positive map can be expressed as

$$ \mathcal{N}(\rho) = \sum_i A_i \rho A_i^\dagger, \tag{21} $$

and the condition for the map to be trace-preserving is that the matrices $A_i$ satisfy

$$ \sum_i A_i^\dagger A_i = I. $$

A natural guess at the capacity of a quantum channel $\mathcal{N}$ would be the maximum of
$\chi$ over all possible distributions of channel outputs, that is,

$$ \chi_{\max}(\mathcal{N}) = \max_{\{p_i, \sigma_i\}_i} \chi(\{\mathcal{N}(\sigma_i), p_i\}_i), \tag{22} $$

since the sender can effectively communicate to the receiver any of the states $\mathcal{N}(\sigma_i)$.
We do not know whether this is the capacity of a quantum channel; if the use of entanglement
between separate inputs to the channel helps to increase channel capacity, it
would be possible to exceed this $\chi_{\max}$. This can be addressed by answering a question
that is simple to state: Is $\chi_{\max}$ additive? That is, if we have two
quantum channels $\mathcal{N}_1$ and $\mathcal{N}_2$, is

$$ \chi_{\max}(\mathcal{N}_1 \otimes \mathcal{N}_2) = \chi_{\max}(\mathcal{N}_1) + \chi_{\max}(\mathcal{N}_2)? \tag{23} $$

Proving superadditivity of the quantity $\chi_{\max}$ (i.e., the $\ge$ direction of Eq. (23)) is easy.
The open question is whether strictly more capacity can be attained by using the tensor
product of two channels jointly than by using them separately.
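To make the operator-sum form (21) and the quantity $\chi$ of Eq. (22) concrete, here is a small sketch. The amplitude damping channel, its parameter `gamma`, and the two-state input ensemble are illustrative choices that do not come from the text; evaluating $\chi$ on one fixed ensemble only gives a lower bound on $\chi_{\max}$.

```python
import numpy as np

def apply_channel(kraus, rho):
    """N(rho) = sum_i A_i rho A_i^dagger, as in Eq. (21)."""
    return sum(A @ rho @ A.conj().T for A in kraus)

def is_trace_preserving(kraus, tol=1e-12):
    """Check the condition sum_i A_i^dagger A_i = I."""
    d = kraus[0].shape[1]
    total = sum(A.conj().T @ A for A in kraus)
    return bool(np.allclose(total, np.eye(d), atol=tol))

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def holevo_chi(kraus, probs, states):
    """chi of the output ensemble {N(sigma_i), p_i}."""
    outs = [apply_channel(kraus, s) for s in states]
    avg = sum(p * o for p, o in zip(probs, outs))
    return entropy(avg) - sum(p * entropy(o) for p, o in zip(probs, outs))

# Illustrative channel: qubit amplitude damping with damping probability gamma.
gamma = 0.3
amp_damp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
identity = [np.eye(2, dtype=complex)]

ket0 = np.diag([1.0, 0.0]).astype(complex)   # |0><0|
ket1 = np.diag([0.0, 1.0]).astype(complex)   # |1><1|

chi_noiseless = holevo_chi(identity, [0.5, 0.5], [ket0, ket1])
chi_damped = holevo_chi(amp_damp, [0.5, 0.5], [ket0, ket1])
print("trace preserving:", is_trace_preserving(amp_damp))
print("chi (noiseless) =", chi_noiseless, " chi (amplitude damping) =", chi_damped)
```

For the noiseless qubit channel this ensemble achieves the full one bit, while the damped channel gives a strictly smaller value.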

We now return to the discussion of the proof of the Holevo-Schumacher-Westmoreland
theorem in the special case where the $\sigma_i$ are pure states. The proof of this case in fact
appeared before the general theorem was proved [21]. The proof uses three ingredients.
These are (1) random codes, (2) typical subspaces, and (3) the square root
measurement.

The square root measurement is also called the "pretty good" measurement, and
we have already seen an example of it. Recall our second example for accessible
information, where we took the three vectors $v_i \otimes v_i$, where
$v_i = (\cos(2\pi i/3), \sin(2\pi i/3))$ for $i = 0, 1, 2$. The optimal measurement for $I_{acc}$ on
these vectors was the von Neumann measurement obtained by "pulling" them farther apart until
they were orthogonal. This is an example of the square root measurement.

Suppose that we are trying to distinguish between vectors $w_1, w_2, \ldots, w_n$, which
appear with equal probability (the square root measurement can also be defined for
vectors having unequal probabilities, but we do not need this case). Let
$\phi = \sum_i w_i w_i^\dagger$. The square root measurement has POVM elements
$E_i = \phi^{-1/2} w_i w_i^\dagger \phi^{-1/2}$. We have

$$ \sum_i E_i = \phi^{-1/2} \Big(\sum_i w_i w_i^\dagger\Big) \phi^{-1/2} = I $$

on the span of the $w_i$, so these $E_i$ do indeed form a POVM.

We can now give the coding algorithm for the capacity theorem for pure states.
We choose $M$ codewords $u_j = v_{i_1} \otimes v_{i_2} \otimes \cdots \otimes v_{i_n}$, where the
$v_i$ are chosen at random with probability $p_i$. We then use these particular $M$ codewords
$u_j$ to send information, where the coding scheme is chosen so that each of these codewords is
sent with probability $\frac1M$. The difficult part of the proof is now showing that a random
codeword can be identified with high probability.

To decode, we perform the following steps: 

1. Project into the typical subspace $T$. Most of the time, this projection works, and
we obtain $w_j = (u_j^\dagger \Pi_T u_j)^{-1/2} \Pi_T u_j$, where $\Pi_T$ is the projection matrix onto
the subspace $T$.

2. Use the square root measurement on the $w_j$.

The probability of error, given that the original state was $w_j$, is

$$ 1 - \mathrm{Tr}(E_j w_j w_j^\dagger) = 1 - (w_j^\dagger \phi^{-1/2} w_j)^2. $$

The overall probability of error is thus

$$ 1 - \frac1M \sum_j (w_j^\dagger \phi^{-1/2} w_j)^2. \tag{25} $$

The intuition for why this procedure works (this intuition is apparently not even close
to being rigorous, as the proof works along substantially different lines) is that for
this probability of error to be small, we need that $\phi^{-1/2} w_j$ is close to $w_j$ for most
$j$. However, the $w_j$ are distributed more or less randomly in the typical subspace $T$,
so $\phi = \sum_j w_j w_j^\dagger$ is moderately close to the identity matrix on its support, and thus
$\phi^{-1/2} w_j$ is close to $w_j$. Note that we need that the number $M$ of $u_j$ is less than
$\dim T$, or otherwise it would be impossible to distinguish the $w_j$, since by Holevo's bound (19) a
$d$-dimensional quantum state space can carry at most $\log d$ bits of information.
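The intuition above can be illustrated with a toy experiment. The sketch below is not the coding scheme of the theorem: as a stand-in assumption, the $w_j$ are replaced by Haar-like random unit vectors in a $d$-dimensional space playing the role of $T$, and the error expression (25) is evaluated directly.

```python
import numpy as np

def random_unit_vectors(m, d, rng):
    """m random complex unit vectors in dimension d (rows of the result)."""
    x = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def srm_error(w):
    """Average error 1 - (1/M) sum_j (w_j^dagger phi^{-1/2} w_j)^2 of Eq. (25)."""
    phi = w.T @ w.conj()                       # phi = sum_j w_j w_j^dagger
    vals, vecs = np.linalg.eigh(phi)
    keep = vals > 1e-10                        # invert only on the support of phi
    inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].conj().T
    overlaps = np.einsum("ja,ab,jb->j", w.conj(), inv_sqrt, w).real
    return 1.0 - float(np.mean(overlaps ** 2))

rng = np.random.default_rng(0)
d = 64
err_few = srm_error(random_unit_vectors(4, d, rng))    # M much smaller than d
err_many = srm_error(random_unit_vectors(d, d, rng))   # M equal to d
print("error with M = 4: ", err_few)
print("error with M = 64:", err_many)
```

With few vectors, $\phi$ is close to a projector and the error is small; as $M$ approaches the dimension the vectors crowd each other and the error grows.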

9 Calculating the $C_{1,\infty}$ capacity

We now consider the problem of numerically finding the $C_{1,\infty}$ capacity. Recall that
this capacity was expressible as

$$ \max_{\{p_i, v_i\}_i} H_{vN}\Big(\mathcal{N}\Big(\sum_i p_i v_i v_i^\dagger\Big)\Big) - \sum_i p_i H_{vN}(\mathcal{N}(v_i v_i^\dagger)) $$

where the maximum is taken over all ensembles $\{p_i, v_i\}_i$ of pure states in the input
space of the channel. We propose to maximize this in stages, at first holding one
parameter of the ensemble fixed, and then holding other parameters of this ensemble
fixed.

We first consider the problem of maximizing $C_{1,\infty}$ while holding the average density
matrix $\rho = \sum_i p_i v_i v_i^\dagger$ fixed. This is equivalent to the minimization problem

$$ \text{minimize} \quad \sum_i p_i H_{vN}(\mathcal{N}(v_i v_i^\dagger)) \qquad \text{subject to} \quad \sum_i p_i v_i v_i^\dagger = \rho. \tag{26} $$



The problem of finding the probability distribution $p_i$ minimizing the expression (26)
is a linear programming problem, albeit one with an infinite number of variables $p_i$,
one for each of the continuum of pure states $v_i$. There is a standard way of attacking
such problems called column generation. First, the linear program is solved with $v_i$
restricted to be chosen from some fixed finite set of possible signal states (these correspond
to columns of the linear program). We then find vectors $v$ which, if added to
this linear program, would yield a better solution. By iterating the steps of finding new
vectors to add to the linear program and solving the resulting improved linear program,
we hope to eventually converge upon the right solution. If we are guaranteed to find a
good vector $v_i$ to add if one exists, then it turns out that we will indeed converge upon
the optimal solution.

We now give a few more details of this process. If the vectors $v_i$ range over a
$d$-dimensional input space, there are $d^2$ constraints in this problem, arising from the
degrees of freedom of the matrix equality

$$ \sum_i p_i v_i v_i^\dagger = \rho. $$

(Both $v_i v_i^\dagger$ and $\rho$ are $d \times d$ Hermitian matrices, yielding $d^2$ degrees of freedom. Note
that $\sum_i p_i = 1$ is implicit in this matrix equality, as this can be obtained by taking the
trace of both sides.) There is thus an optimum solution which has at most $d^2$ non-zero
values of $p_i$, one for each constraint of the problem.
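One way to hand the matrix equality to a linear programming solver is to flatten each Hermitian matrix into $d^2$ real coordinates, so that every candidate state $v$ contributes one real column. The following sketch shows this bookkeeping only; the helper names and the random candidate states are illustrative assumptions, and no solver is invoked.

```python
import numpy as np

def hermitian_to_real(m: np.ndarray) -> np.ndarray:
    """Flatten a d x d Hermitian matrix into d*d real coordinates:
    d diagonal entries, then real and imaginary parts of the upper triangle."""
    d = m.shape[0]
    upper = np.triu_indices(d, k=1)
    return np.concatenate([np.diag(m).real, m[upper].real, m[upper].imag])

def constraint_column(v: np.ndarray) -> np.ndarray:
    """Column of the equality constraints contributed by the pure state v."""
    v = v / np.linalg.norm(v)
    return hermitian_to_real(np.outer(v, v.conj()))

d = 3
rng = np.random.default_rng(0)
states = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(5)]

# Equality constraints A_eq @ p = b_eq encode sum_i p_i v_i v_i^dagger = rho.
A_eq = np.column_stack([constraint_column(v) for v in states])   # shape (d*d, 5)
p = np.full(5, 0.2)
rho = sum(pi * np.outer(v, v.conj()) / np.vdot(v, v).real for pi, v in zip(p, states))
b_eq = hermitian_to_real(rho)

print("constraint matrix shape:", A_eq.shape)
print("residual of A_eq @ p - b_eq:", np.abs(A_eq @ p - b_eq).max())
```

The first $d$ rows of each column sum to one (the trace of $v v^\dagger$), which is how the normalization $\sum_i p_i = 1$ arises from the matrix equality.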

The success of column generation is dependent on how well we can find a column 
which will be advantageous to add to the linear program. To describe how to do this, 
we need to introduce the concept of the dual of a linear program. 

This dual is another linear program. The constraints of the first program (called the 
primal program) correspond to the variables of the dual program, and vice versa. If 
the primal program is a minimization, then the dual program is a maximization. The 
fundamental theorem of linear programming (strong duality) asserts that the optimum values of these
two programs are equal.

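We now give the dual of the program (26) above. The variables of this dual will be
the entries of a Hermitian matrix $\tau$. The program is

$$ \text{maximize} \quad \mathrm{Tr}\,\tau\rho \qquad \text{subject to} \quad v^\dagger \tau v \le H_{vN}(\mathcal{N}(v v^\dagger)) \ \text{for all unit vectors } v \in \mathbb{C}^d. \tag{27} $$

It may be instructive to consider the proof that the optimal value of the primal is greater
than or equal to the optimal value of the dual. Suppose we have a $\tau$ satisfying the
constraints of (27) such that $\mathrm{Tr}\,\tau\rho = x$. Then

$$ \sum_i p_i H_{vN}(\mathcal{N}(v_i v_i^\dagger)) \ge \sum_i p_i v_i^\dagger \tau v_i = \sum_i p_i \mathrm{Tr}\,\tau v_i v_i^\dagger = \mathrm{Tr}\,\tau\rho = x. \tag{28} $$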

Suppose we have an optimal solution of the primal program restricted to a given fixed
set of variables $v_i$. This solution will correspond to a dual problem with some optimal
Hermitian matrix $\tau$. To find a good column to add to the primal, we need to find a $v$
that violates the constraints of the dual, i.e., such that

$$ H_{vN}(\mathcal{N}(v v^\dagger)) - v^\dagger \tau v < 0. \tag{29} $$

This is a nonlinear optimization problem which I believe should be solvable in low
dimensions by gradient descent. One can look for good vectors to add by starting at an
arbitrary vector $v$ and proceeding downhill to find local minima of the expression (29).
One should start both at random points, and at the vectors $v_i$ having nonzero probability
in the current solution (these latter will improve the solution by adapting to the
perturbation made to the problem since the last iteration). For high dimensions, this
technique will grow inefficient exponentially fast, but the curse of dimensionality also
may mean that the problem is intrinsically difficult.
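The search for a violated dual constraint (29) can be prototyped as follows. This is only a sketch: it uses a derivative-free random local search as a simple stand-in for the gradient descent suggested above, and the amplitude damping channel, the value of `gamma`, and the matrix `tau` are hypothetical inputs chosen for illustration rather than the output of an actual linear program.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def apply_channel(kraus, rho):
    return sum(A @ rho @ A.conj().T for A in kraus)

def reduced_cost(v, kraus, tau):
    """Left-hand side of (29): H_vN(N(v v^dagger)) - v^dagger tau v."""
    v = v / np.linalg.norm(v)
    out = apply_channel(kraus, np.outer(v, v.conj()))
    return entropy(out) - float(np.real(np.vdot(v, tau @ v)))

def local_search(v0, kraus, tau, rng, steps=500, scale=0.3):
    """Random local search: accept a perturbed unit vector only if it lowers (29)."""
    best_v = np.asarray(v0, dtype=complex)
    best_v = best_v / np.linalg.norm(best_v)
    best = reduced_cost(best_v, kraus, tau)
    for _ in range(steps):
        step = rng.normal(size=best_v.shape) + 1j * rng.normal(size=best_v.shape)
        cand = best_v + scale * step
        cand = cand / np.linalg.norm(cand)
        val = reduced_cost(cand, kraus, tau)
        if val < best:
            best_v, best = cand, val
        else:
            scale = max(0.99 * scale, 1e-4)    # shrink the step after a failure
    return best_v, best

gamma = 0.3                                    # illustrative channel parameter
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
tau = 0.5 * np.eye(2, dtype=complex)           # hypothetical dual matrix
v_start = np.array([1.0, 1.0]) / np.sqrt(2)

start_value = reduced_cost(v_start, kraus, tau)
v_best, best_value = local_search(v_start, kraus, tau, np.random.default_rng(0))
print("start:", start_value, " after search:", best_value)
```

A negative final value means the corresponding $v$ violates the dual constraint and is a candidate column to add. Restarting from several random points, as the text recommends, guards against poor local minima.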

We now need to show how to change the average density matrix $\rho$ of our ensemble
so as to improve the optimal value. Recall that we want to maximize

$$ H_{vN}(\mathcal{N}(\rho)) - \sum_i p_i H_{vN}(\mathcal{N}(v_i v_i^\dagger)). \tag{30} $$

By the linear programming duality (28), this is no larger than

$$ H_{vN}(\mathcal{N}(\rho)) - \mathrm{Tr}\,\rho\tau, \tag{31} $$

for an arbitrary ensemble $\{p_i, v_i\}_i$, and equal at the current maximum. From the concavity
of entropy, the expression (31) is concave in $\rho$. Thus, if there is no direction
to change $\rho$ that will increase the maximum (31), there is also no direction that will
increase (30), and we have found the optimal value of $\rho$ (at least for our current set of
$v_i$). If there is a direction that increases (31), then we can use binary search to find the
optimum distance to move $\rho$ in that direction. For the complete linear program with a
continuum of variables $v_i$, we can use a smoothness argument to show that this same
direction will also increase the objective function. This argument, unfortunately, does
not appear to carry over to the finite dimensional linear program on a fixed set of $v_i$
that we actually are solving. If things work well, it may turn out that attempting to
move in this direction will result in a procedure that always converges to the optimum.
Otherwise, we may have to use the polyhedral structure of the solution to our finite
dimensional linear program to discover a good direction to move $\rho$.

The derivatives for Eqs. (29) and (31) can both be calculated explicitly. This can
be done using the estimate

$$ H(\rho + \epsilon\Delta) = H(\rho) - \epsilon\, \mathrm{Tr}\, \Delta \log \rho + O(\epsilon^2), $$

which holds for matrices with $\mathrm{Tr}\,\rho = 1$ and $\mathrm{Tr}\,\Delta = 0$. It can be derived from the
integral expression for $\log(\rho + \epsilon\Delta)$ given in Eq. (20) of [42].
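The first-order estimate can be sanity-checked numerically. The sketch below assumes base-2 logarithms throughout and uses a randomly generated, well-conditioned $\rho$ and a traceless Hermitian direction $\Delta$ normalized to unit operator norm; these are test inputs chosen for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def matrix_log2(rho):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log2(vals)) @ vecs.conj().T

rng = np.random.default_rng(1)
d = 3
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = 0.5 * rho / np.trace(rho).real + 0.5 * np.eye(d) / d   # full rank, trace one

b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
delta = b + b.conj().T
delta = delta - (np.trace(delta).real / d) * np.eye(d)       # make it traceless
delta = delta / np.linalg.norm(delta, 2)                     # unit operator norm

def first_order_error(eps):
    """|H(rho + eps*Delta) - (H(rho) - eps * Tr(Delta log rho))|."""
    exact = entropy(rho + eps * delta)
    linear = entropy(rho) - eps * np.trace(delta @ matrix_log2(rho)).real
    return abs(exact - linear)

for eps in (1e-2, 1e-3):
    print(f"eps={eps:g}  error={first_order_error(eps):.3e}")
```

Shrinking $\epsilon$ by a factor of ten should shrink the error by roughly a factor of one hundred, consistent with the $O(\epsilon^2)$ remainder.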

We thus propose to find the value of $C_{1,\infty}$ numerically by iterating the following
steps:

1. With a fixed set of $v_i$, and the constraint $\sum_i p_i v_i v_i^\dagger = \rho$, solve the linear
program (26).

2. Change $\rho$ so as to improve the value of the linear program solution above.

3. Find quantum states $v$ corresponding to columns it would be advantageous to
add to the linear program.

If none of the steps can improve the solution, then we have discovered the optimum
value of $C_{1,\infty}$. The hard step is (3); this is a non-linear optimization problem for
which we have no good criteria to test whether we have discovered the global optimum.
In large dimensions, this will clearly be the bottleneck step that makes the procedure
impractical.



10 Calculating the $C_{1,1}$ capacity

We now very briefly describe how the ideas of the last section might be used to give
a heuristic procedure for finding the $C_{1,1}$ capacity. Our technique again uses column
generation. This time we propose to alternate between optimizing the ensemble of
signal states used for the input, and optimizing the measurement used by the receiver.
This is not guaranteed to find the global optimum; the second example of Section 7 has
four different local optima that are stable points in this procedure: the three ensembles
each containing two of the signal states with probability $\frac12$, and the ensemble containing
all three with probability $\frac13$.

To find the optimal signal states, given a fixed measurement, we can in fact use
exactly the procedure given in the previous section. By fixing the measurements, we
have defined a quantum channel, with the input being a quantum state and the output
being the results of the measurement, and so the procedure of the previous section is
applicable. This is not an arbitrary quantum channel, as the output state is classical.
Unfortunately, at this point we do not see how to use this fact to simplify the calculation
of the capacity.

Finding the optimal measurement, given the signal states, can be done using essentially
the same ideas as in the previous section. The measurement we use must extract
a maximal amount of information, and so we can assume that each $E_i$ takes the form
$E_i = q_i w_i w_i^\dagger$ for some unit quantum state vector $w_i$ in the output space. The condition
that these form a POVM is

$$ \sum_i q_i w_i w_i^\dagger = I. $$

Now, the expression for the capacity is the entropy of the input minus the entropy
of the input, given the output. This can be seen to be linear in $q_i$. Again, we have
an infinite dimensional linear program, which we can solve using the technique of
column generation. In this case, it is even slightly simpler; because the constraints are
$\sum_i q_i w_i w_i^\dagger = I$, we do not need to incorporate any additional steps of optimizing $\rho$.



11 Entanglement-assisted capacities 

In this section, we define the entanglement-assisted capacity of a quantum channel, and 
give the expression for it. For motivation, we first describe two surprising phenomena 
in quantum communication: superdense coding and quantum teleportation. 

The process of superdense coding uses a shared EPR pair and a single qubit to
encode two classical bits [11]. This is an improvement on the capacity of a noiseless,
unassisted, quantum channel, which takes one quantum bit to send a classical bit. We
will assume that the shared EPR pair is in the state

$$ \frac{1}{\sqrt2}(|01\rangle - |10\rangle), $$

where the sender holds the first qubit and the receiver holds the second qubit. In this
protocol, the sender starts by taking an EPR pair and applying to his qubit either the identity
operation or one of the three Pauli matrices

$$ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$

He then sends his qubit to the receiver. The receiver now holds one of the four quantum
states (up to an overall phase)

$$ \tfrac{1}{\sqrt2}(|01\rangle - |10\rangle), \quad \tfrac{1}{\sqrt2}(|11\rangle - |00\rangle), \quad \tfrac{1}{\sqrt2}(|11\rangle + |00\rangle), \quad \tfrac{1}{\sqrt2}(|01\rangle + |10\rangle). $$

These four states are known as the Bell basis, and they are mutually orthogonal. The
receiver can thus uniquely identify which of these states he has, and so can unambiguously
identify one of four messages, or two bits. (See Figure 2.)
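The protocol is small enough to simulate directly. The sketch below assumes the standard Pauli matrices and the computational-basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, with the sender's qubit first; the `encode` and `decode` helpers are names introduced for this illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

s = 1 / np.sqrt(2)
singlet = s * np.array([0, 1, -1, 0], dtype=complex)      # (|01> - |10>)/sqrt(2)

# Bell basis in the order the four states are listed in the text.
bell = [s * np.array([0, 1, -1, 0], dtype=complex),       # (|01> - |10>)/sqrt(2)
        s * np.array([-1, 0, 0, 1], dtype=complex),       # (|11> - |00>)/sqrt(2)
        s * np.array([1, 0, 0, 1], dtype=complex),        # (|11> + |00>)/sqrt(2)
        s * np.array([0, 1, 1, 0], dtype=complex)]        # (|01> + |10>)/sqrt(2)

def encode(message: int) -> np.ndarray:
    """Sender applies the Pauli operator numbered `message` to the first qubit."""
    return np.kron(paulis[message], I2) @ singlet

def decode(state: np.ndarray) -> int:
    """Receiver measures both qubits in the Bell basis."""
    overlaps = [abs(np.vdot(b, state)) ** 2 for b in bell]
    return int(np.argmax(overlaps))

gram = np.array([[abs(np.vdot(encode(a), encode(b))) for b in range(4)] for a in range(4)])
for message in range(4):
    print(format(message, "02b"), "->", format(decode(encode(message)), "02b"))
```

The Gram matrix of the four encoded states is the identity, which is the orthogonality that lets the receiver recover both bits without error.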

There is a converse process to superdense coding known as quantum teleportation.
It is impossible to send a quantum state over an unassisted classical channel. However,
quantum teleportation lets a sender and a receiver who share an EPR pair of qubits
communicate one qubit by sending two classical bits and using this EPR pair [8]. (See
Figure 3.) In quantum teleportation, the sender measures the unknown quantum
state and his half of the EPR pair in the Bell basis, and sends the receiver the two classical bits
which are the results of this measurement. The receiver then performs a unitary operation
on his half of the EPR pair that depends on these two bits. The measurements the sender
makes are the same ones the receiver makes in superdense coding, and the unitary
transformations the receiver performs are those the sender performs in superdense coding.
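Teleportation can be simulated in the same way. In the sketch below the shared pair is assumed to be in the state $\frac{1}{\sqrt2}(|01\rangle - |10\rangle)$, the unknown state is a random qubit, and the table `correction` (which Pauli operator the receiver applies for each Bell outcome) was worked out for this particular choice of shared state; the names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

s = 1 / np.sqrt(2)
bell = {
    "psi-": s * np.array([0, 1, -1, 0], dtype=complex),   # (|01> - |10>)/sqrt(2)
    "psi+": s * np.array([0, 1, 1, 0], dtype=complex),    # (|01> + |10>)/sqrt(2)
    "phi-": s * np.array([1, 0, 0, -1], dtype=complex),   # (|00> - |11>)/sqrt(2)
    "phi+": s * np.array([1, 0, 0, 1], dtype=complex),    # (|00> + |11>)/sqrt(2)
}
# Receiver's correction for each of the sender's four measurement outcomes.
correction = {"psi-": I2, "psi+": Z, "phi-": X, "phi+": Y}

def teleport(psi: np.ndarray) -> dict:
    """Return {outcome: (probability, fidelity of the corrected state with psi)}."""
    # Qubit order: unknown state, sender's EPR half, receiver's EPR half.
    total = np.kron(psi, bell["psi-"]).reshape(4, 2)   # rows: sender, columns: receiver
    results = {}
    for name, b in bell.items():
        r = b.conj() @ total                 # receiver's unnormalized state
        prob = float(np.vdot(r, r).real)
        out = correction[name] @ (r / np.sqrt(prob))
        results[name] = (prob, float(abs(np.vdot(psi, out)) ** 2))
    return results

rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)
outcomes = teleport(psi)
for name, (prob, fid) in outcomes.items():
    print(f"{name}: probability={prob:.3f}, fidelity={fid:.6f}")
```

Each outcome occurs with probability $\frac14$ independently of the unknown state, so the two classical bits by themselves reveal nothing about it, and after the correction the receiver's qubit matches the original up to a global phase.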

Quantum teleportation is a counterintuitive process, which at first sight seems to
violate certain laws of physics; however, upon closer inspection one discovers that no
actual paradoxes arise from teleportation. Teleportation cannot be used for superluminal
communication, because the classical bits must travel at or slower than the speed
of light. A continuous quantum state, which at first sight appears to contain many
more than two bits of information, appears to have been transported using two discrete
bits; however, by Holevo's bound, Eq. (19), one qubit can be used to transport
at most one classical bit of information, so it is not possible to increase the capacity
of a classical channel by encoding information in a teleported qubit. Finally, there is
a theorem of quantum mechanics that an unknown quantum state cannot be duplicated
[52]. However, this no-cloning theorem (as it is known) is not violated; the original
state is necessarily destroyed by the measurement, so teleportation cannot be used to
clone a quantum state.

Figure 2: A schematic drawing of superdense coding. The sender can communicate
two classical bits to the receiver using one qubit and a shared EPR pair. Here, the
sender makes the same unitary transformation that the receiver would make in quantum
teleportation, and the receiver makes the joint measurement that the sender would make
in quantum teleportation.

We now give another capacity for quantum channels, one which has a capacity formula
which can actually be completely proven, even in the case of infinite-dimensional
Hilbert spaces [9, 10, 25, 26]. Recall that if $\mathcal{N}$ is a noiseless quantum channel, and if
the sender and receiver possess shared EPR pairs, they can use superdense coding to
double the classical information capacity of $\mathcal{N}$. Similarly, if $\mathcal{N}$ is a noiseless classical
channel, EPR pairs can increase the capacity of the channel to send quantum information
from zero qubits to half a qubit per channel use.

In general, if $\mathcal{N}$ is a noisy quantum channel, using shared EPR pairs can increase
both the classical and quantum capacities of $\mathcal{N}$. With the aid of entanglement, the
capacity for sending quantum information becomes exactly half of the capacity for
classical information (this is a direct consequence of the phenomena of superdense
coding and teleportation). We define the entanglement assisted capacity, $C_E$, as the
quantity of classical information that can asymptotically be sent per channel use if the
sender and receiver have access to a sufficient quantity of shared entanglement.

Figure 3: A schematic drawing of quantum teleportation. The sender has a qubit in an
unknown state $\psi$ that he wishes to send to the receiver. He also has half of an EPR
state which he shares with the receiver. The sender makes a joint measurement on the
unknown qubit and half of his EPR state, and communicates the results (2 classical bits)
to the receiver. The receiver then makes one of four unitary transformations (depending
on the two classical bits he received) on his half of the EPR state to obtain the state $\psi$.

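Theorem (Bennett, Shor, Smolin, Thapliyal [9, 10]): The entanglement assisted
capacity is

$$ C_E(\mathcal{N}) = \max_{\rho \in \mathcal{H}_{in}} H_{vN}(\rho) + H_{vN}(\mathcal{N}(\rho)) - H_{vN}((\mathcal{N} \otimes \mathcal{I})(\Phi_\rho)) \tag{32} $$

where $\rho \in \mathcal{H}_{in}$ is a density matrix over the input space. Here, $\Phi_\rho$ is a pure state over
the tensor product space $\mathcal{H}_{in} \otimes \mathcal{H}_R$ such that $\mathrm{Tr}_2 \Phi_\rho = \rho$, where $\mathrm{Tr}_2$
denotes the partial trace over the second factor. Here $\mathcal{H}_{in}$ is the input state
space and $\mathcal{H}_R$ is a reference system. The third term of the right hand side of (32),
$H_{vN}((\mathcal{N} \otimes \mathcal{I})(\Phi_\rho))$, is the entropy of the state resulting after the first half of
$\Phi_\rho$ is sent through the channel $\mathcal{N}$, while the identity operation is applied to the second
half. The value of this term is independent of which reference system $\mathcal{H}_R$ and which pure state
$\Phi_\rho$ are chosen.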
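The three terms of Eq. (32) can be evaluated for a given $\rho$ as follows. This is a sketch under stated assumptions: the channel is given by Kraus operators with equal input and output dimension, the reference system is taken to have the same dimension as the input, entropies are in bits, and the purification is built from the eigendecomposition of $\rho$. No maximization over $\rho$ is attempted here.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def purification(rho):
    """|Phi_rho> = sum_k sqrt(lambda_k) |e_k> (x) |k>; tracing out the
    second (reference) factor gives back rho."""
    d = rho.shape[0]
    vals, vecs = np.linalg.eigh(rho)
    phi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        ref = np.zeros(d)
        ref[k] = 1.0
        phi += np.sqrt(max(vals[k], 0.0)) * np.kron(vecs[:, k], ref)
    return phi

def quantum_mutual_information(kraus, rho):
    """H(rho) + H(N(rho)) - H((N (x) I)(Phi_rho)), the quantity inside Eq. (32)."""
    d = rho.shape[0]
    phi = purification(rho)
    big = np.outer(phi, phi.conj())
    lifted = [np.kron(A, np.eye(d)) for A in kraus]      # channel acts on first half only
    joint = sum(L @ big @ L.conj().T for L in lifted)
    out = sum(A @ rho @ A.conj().T for A in kraus)
    return entropy(rho) + entropy(out) - entropy(joint)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

noiseless = [I2]
fully_depolarizing = [0.5 * P for P in (I2, X, Y, Z)]    # maps every state to I/2
rho_mixed = I2 / 2

qmi_noiseless = quantum_mutual_information(noiseless, rho_mixed)
qmi_depolarized = quantum_mutual_information(fully_depolarizing, rho_mixed)
print("noiseless qubit channel:", qmi_noiseless)
print("fully depolarizing channel:", qmi_depolarized)
```

For the noiseless qubit channel with $\rho = \frac12 I$ the value is 2 bits, matching superdense coding, and for the channel that replaces every input by $\frac12 I$ it is zero.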

The quantity being maximized in the above formula (32) is sometimes called quantum
mutual information, and it is a generalization of the expression for classical mutual
information in the form $H(X) + H(Y) - H(X,Y)$. The proof of this result uses typical subspaces,
superdense coding, the Holevo-Schumacher-Westmoreland theorem on the classical capacity of a
quantum channel, and the strong subadditivity property of von Neumann entropy. The
quantum mutual information is a concave function of $\rho$, and so can be maximized
with a straightforward application of gradient methods.

In the entanglement-assisted capacity, the communication protocol consumes the
resource of entanglement. In general, it takes $H_{vN}(\rho)$ bits of entanglement (i.e., EPR
pairs) per channel use to achieve the capacity $C_E(\mathcal{N})$ in Eq. (32). I have also conjectured
a formula for the capacity of a quantum channel using protocols which are only
allowed to use a limited amount of entanglement. This formula is as follows.

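Conjecture: If the available entanglement per channel use is $B$ bits, the capacity
available is

$$ \max_{\sum_i p_i H_{vN}(\rho_i) \le B} \ \sum_i p_i H_{vN}(\rho_i) + H_{vN}\Big(\mathcal{N}\Big(\sum_i p_i \rho_i\Big)\Big) - \sum_i p_i H_{vN}((\mathcal{N} \otimes \mathcal{I})(\Phi_{\rho_i})), \tag{33} $$

where $\mathrm{Tr}_2 \Phi_{\rho_i} = \rho_i$, as in Eq. (32). Here, the maximization is over all probabilistic
ensembles of density matrices $\{\rho_i, p_i\}_i$ where $\rho_i \in \mathcal{H}_{in}$, $\sum_i p_i = 1$, and the average
entropy of the ensemble, $\sum_i p_i H_{vN}(\rho_i)$, is at most $B$.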

I have a protocol which achieves this bound, and I can prove a matching upper bound 
over a restricted class of protocols. 

A heuristic for finding this capacity with assistance by limited entanglement can be
constructed using linear programming, column generation, and non-linear optimization,
along the same lines as the procedure of Section 9. An extra constraint must be
added that bounds the average entropy of the ensemble. One additional difficulty will
be that the non-linear optimization problem needed to find good columns to add appears
to become substantially harder.

12 Sending Quantum Information 

Finally, we briefly mention the problem of sending quantum information (i.e., a quantum
state) over a noisy quantum channel. In this scenario, several of the theorems that
make classical channel capacity behave so nicely are demonstrably not true. Here, a
feedback channel from the receiver to the sender, or a classical two-way side channel,
can increase the quantum channel capacity, leading to several different capacities for
transmitting quantum information. For the two-way quantum capacity, $Q_2$, the sender
and receiver have a classical side channel they can use for free. For the quantum capacity
with feedback, $Q_{FB} \le Q_2$, the receiver has a classical feedback channel from
himself to the sender. For the one-way quantum capacity, $Q \le Q_{FB}$, all communication
is directly from the sender to the receiver over the noisy quantum channel $\mathcal{N}$. The
quantity $Q_2$ is closely related to a quantity defined on quantum states called the distillable
entanglement [13]. Despite substantial study, not only do we not have any good
ways to compute either $Q_2$ or $Q_{FB}$, but we also have no simple capacity formulas for
representing them. There is a capacity formula for the one-way quantum capacity $Q$. It is
essentially the last two terms of the expression for entanglement-assisted capacity:

$$ Q(\mathcal{N}) = \lim_{n \to \infty} \frac1n \max_{\rho \in \mathcal{H}_{in}^{\otimes n}} H_{vN}(\mathcal{N}^{\otimes n}(\rho)) - H_{vN}((\mathcal{N}^{\otimes n} \otimes \mathcal{I})(\Phi_\rho)) \tag{34} $$

where $\rho$, $\mathcal{H}_{in}$ and $\Phi_\rho$ are defined as in (32). The quantity being maximized (before the
limit $n \to \infty$) is called the coherent information. We need to take the maximum over
the tensor product of $n$ uses of the channel, and let $n$ go to infinity, because unlike the
classical (or the quantum) mutual information, the coherent information is not additive
[16]. The quantity $Q(\mathcal{N})$ of Eq. (34) is the quantum capacity of a noisy quantum channel
$\mathcal{N}$ [34, 5, 17, 50]. Even for maximizing the single-symbol expression (that is, taking
$n = 1$ in Eq. (34)), the calculation of the coherent information appears to be a difficult
optimization problem, as there may be multiple local maxima. It would be a significant
accomplishment to discover a good means of calculating this; unfortunately, I do not
have any useful suggestions.